Classification of white blood cells (leucocytes) from blood smear imagery using machine and deep learning models: A global scoping review

Machine learning (ML) and deep learning (DL) models are being increasingly employed for medical imagery analyses, with both approaches used to enhance the accuracy of classification/prediction in the diagnoses of various cancers, tumors and bloodborne diseases. To date however, no review of these techniques and their application(s) within the domain of white blood cell (WBC) classification in blood smear images has been undertaken, representing a notable knowledge gap with respect to model selection and comparison. Accordingly, the current study sought to comprehensively identify, explore and contrast ML and DL methods for classifying WBCs. Following development and implementation of a formalized review protocol, a cohort of 136 primary studies published between January 2006 and May 2023 were identified from the global literature, with the most widely used techniques and best-performing WBC classification methods subsequently ascertained. Studies derived from 26 countries, with highest numbers from high-income countries including the United States (n = 32) and The Netherlands (n = 26). While WBC classification was originally rooted in conventional ML, there has been a notable shift toward the use of DL, and particularly convolutional neural networks (CNN), with 54.4% of identified studies (n = 74) including the use of CNNs, and particularly in concurrence with larger datasets and bespoke features e.g., parallel data pre-processing, feature selection, and extraction. While some conventional ML models achieved up to 99% accuracy, accuracy was shown to decrease in concurrence with decreasing dataset size. Deep learning models exhibited improved performance for more extensive datasets and exhibited higher levels of accuracy in concurrence with increasingly large datasets. Availability of appropriate datasets remains a primary challenge, potentially resolvable using data augmentation techniques. 
Moreover, medical training of computer science researchers is recommended to improve current understanding of leucocyte structure and subsequent selection of appropriate classification models. Likewise, it is critical that future health professionals be made aware of the power, efficacy, precision and applicability of computer science, soft computing and artificial intelligence contributions to medicine, and particularly in areas like medical imaging.


Introduction
White blood cells (WBCs) play a vital role in the human immune system. They identify and neutralize pathogens including bacteria, viruses, and cancer cells. Classification of WBCs is therefore vital for accurate and early diagnosis and treatment of a range of diseases and medical conditions [1]. Machine learning techniques, both traditional and deep, have been widely adopted for myriad applications, including medical image analysis (MIA). MIA is critical in modern healthcare systems, aiding medical professionals in making well-informed decisions. It is currently used to diagnose brain tumors, lung cancer, anemia, leukemia, and malaria, via a range of image modalities including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), ultrasound, Positron Emission Tomography (PET), blood smear images, and hybrid modalities [2]. Accordingly, MIA has attracted significant attention from computer vision experts, with traditional and deep machine learning techniques having been applied in leukocyte segmentation, cancer detection, classification, medical image annotation, and image retrieval in computer-aided diagnosis (CAD). The efficacy of these methods therefore directly influences clinical diagnosis and treatment strategies, highlighting the significance of technological advancements, such as high-speed computational resources and improved hardware and storage capabilities for CAD [3][4][5]. One of the primary application areas for CAD systems using traditional machine learning and deep learning is segmentation and classification of leukocytes (WBCs). Leukocytes provide valuable information to medical professionals (doctors, hematologists, pathologists, and radiologists) for diagnosing various blood-related issues, including Human Immunodeficiency Virus (HIV) and blood cancer (leukemia). Changes in the WBC count and/or morphological cell alterations, for instance variations in size, shape, and color observed in blood smear images, can provide valuable insights into various health disorders [6][7][8][9].
Blood cells are categorized into three major types: WBCs (leukocytes), red blood cells (erythrocytes), and platelets (thrombocytes). Leukocytes are subdivided into five types: monocytes, lymphocytes, neutrophils, basophils, and eosinophils (Fig 1). Over the past two decades, significant advances have been made in traditional ML and DL methods for classification and segmentation of WBCs in microscopic blood smear images. Conventional methods depend on manually analyzing these images using microscopy, which is typically slow, laborious and error prone. Thus, development of automated and computer-aided systems has become crucial in accurate, systematic, unbiased and rapid clinical diagnosis and effective treatment. Automated analysis of WBCs in blood smear images can significantly reduce the workload of hematologists and provide fast, accurate, and efficient results to assist medical professionals in the diagnostic process [10][11][12][13]. There are two overarching methods typically used to achieve automated WBC classification in blood smear images: traditional machine learning (ML) and deep learning (DL) techniques. These techniques have the potential to make medical hematology more efficient. A generalized overview of machine learning and deep learning techniques used to classify WBCs is presented in Fig 2. The traditional machine learning process involves interconnected steps such as segmenting the region of interest and extracting features, followed by optimal classification [14,15]. The feature extraction phase in traditional machine learning methods is challenging and directly impacts classification performance. More recently, deep learning approaches have been increasingly used due to higher performance and decreasing complexity. Advanced deep learning methods with transfer learning have further improved implementation of automated systems for classification of WBCs. Notwithstanding the importance of ML and DL in medical image analysis (MIA), a gap remains in white blood cell classification via blood smear imagery; to date, no global review of these approaches is available in the published scientific literature. Accordingly, the present study sought to comprehensively identify and synthesize ML and DL methods, focusing on classifying five white blood cell types, and present this in concurrence with an overview of recommended future work, challenges and limitations associated with the identified approaches.

Review protocol
A well-organized and formally structured review process is essential to identify, scan, include/exclude and synthesize targeted literature which satisfies preexisting search criteria and effectively employs existing resources [16]. In the current review, the authors sought to incorporate the most recent and relevant research articles based on manual and automatic searches to identify all significant content. The approach was initiated by identifying pertinent research questions. The two research questions (RQ) formulated in accordance with the PCC (Population, Concept, Context) search framework are as follows:
i. How have systems been developed for classification of WBCs based on ML and DL?
ii. What are the applications of traditional machine learning and deep learning methods for effectively classifying WBCs in blood smear images?
Relevant studies were identified using specific keywords extracted from the research questions (Table 1). These keywords covered various aspects, including segmentation, classification and detection of WBCs. They also covered machine learning techniques, spanning both traditional and advanced deep learning methods, and recognized the importance of big data and artificial intelligence (AI) via keywords such as "Big data" and "Artificial Intelligence", respectively. This careful selection of keywords ensured a focused and comprehensive search across databases, resulting in retrieval of relevant data for the study.
The next review phase after RQ development was identification of relevant articles/studies via automated searching of electronic databases based on keywords extracted from the RQs (iterative combinations of ((A1-A4) x (B1-B4)) from Table 1). Articles published from 2006 to May 2023 were included for review. To align with the study's emphasis on recent research trends and technological progress, articles prior to 2006 were omitted. Research articles were located from three repositories: Google Scholar, Scopus, and Web of Science. The inclusion and exclusion criteria are presented in Table 2.
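The iterative keyword pairing described above can be sketched as a Cartesian product of two keyword groups. This is only an illustration: Table 1 is not reproduced here, so the terms in `GROUP_A` and `GROUP_B` are hypothetical placeholders, and the `AND` query syntax is an assumption rather than the exact search string used in the review.

```python
from itertools import product

# Hypothetical placeholder terms standing in for keyword groups A1-A4 and
# B1-B4 of Table 1 (the actual keywords are not reproduced in this text).
GROUP_A = ["white blood cell", "leukocyte", "WBC", "blood smear"]
GROUP_B = ["classification", "segmentation", "machine learning", "deep learning"]


def build_queries(group_a, group_b):
    """Return one boolean search string per (A, B) keyword pair."""
    return [f'"{a}" AND "{b}"' for a, b in product(group_a, group_b)]


queries = build_queries(GROUP_A, GROUP_B)
print(len(queries))  # 4 x 4 = 16 keyword combinations
print(queries[0])
```

Each generated string would then be submitted to the chosen databases, with the results pooled and deduplicated.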
Overall, a total of 3750 articles from Google Scholar, Web of Science, and Scopus were identified (Fig 3). Following deduplication, this collection decreased to 2210 articles. Based on a thorough evaluation of article titles, abstracts and included data (from methodology section and Appendices), a further 2075 articles were excluded from further consideration. The final article cohort includes only articles published in English between January 1st 2006 and May 31st 2023, and independently adjudged (2 x groups of 2 authors) by the author team as being directly relevant to the topic (Table 2). Quality assessment of included research papers, while not strictly considered necessary for scoping reviews, is critical in assessing literature consistency, validity, and overall credibility [18]. Accordingly, the authors employed a non-summative 5-point quality system adapted from Wylde et al. (2017). Our tool consisted of five items used to assess (1) relevance to scoping review question (based on full paper review), (2) selection bias (i.e., input data sources provided), (3) transferability (open-source data usage, open code), (4) bias due to missing data and/or lack of clarity, and (5) consideration of analytical confounding, model overfitting and/or study limitations. Each item was rated as adequate, inadequate or not reported, with only articles attributed as being "adequate" across all five criteria adjudged to be acceptable quality for narrative inclusion and data synthesis.
5). High-income countries were, perhaps unsurprisingly, well represented, likely due to the availability of large datasets for training and testing, in addition to increasingly mature/well-funded national healthcare systems. As shown (Fig 6), eight overarching model architectures and methods were employed for classification, including both traditional machine learning and deep learning models. Traditional machine learning models included Decision Trees (DT), K-means, Naive Bayes Classifier (NBC), Nearest Neighbor Classifier (NNC), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and thresholding techniques. Within the deep learning domain, convolutional neural networks were the most frequently employed approach, likely due to their high performance and accuracy (compared and presented in Sections 3.2-3.6). In total, 27 datasets were specifically referenced across the identified relevant studies, ranging from 21 images [88] to 92,800 images [73] (Table 3), including ALL-IDB, one Private Dataset [60], CellaVision, AA-IDB2, Hayatabad Medical Dataset, Isfahan Al-Zahra and Omid  3. Just two studies [6] specifically referred to the use of thin blood smear images, with the remaining studies either expressly referring to the use of thick blood smears or not reporting on smear type; this is notable, as thick features have inherent advantages over thin features in WBC classification outcomes.

White blood cell classification using conventional machine learning
Various studies have explored conventional machine learning methods for WBC classification, which for the purpose of clarity, the authors have organized into pre-processing-based techniques (Section 3.3.1), feature extraction (Section 3.3.2), and classification (Section 3.3.3). A total of 39 studies were identified, with 13 papers (33.3%) focused on pre-processing techniques, 15 papers (38.5%) delving into feature extraction methods, and 11 papers (28.2%) emphasizing classification techniques for WBC classification. This distribution of approaches and objectives is evident in Tables 4-7, highlighting diverse emphases on these sub-processes within the conventional machine learning domain for classifying WBCs.

Pre-processing-based ML techniques.
Pre-processing-based techniques include methods that manipulate and enhance raw data prior to further analysis. In the context of WBC classification, these techniques play a critical role in refining images to enable accurate categorization. Rosyadi et al. [19] used optical microscopy to generate blood sample images, with their method comprising four stages: image pre-processing, segmentation, feature extraction, and classification. In the first phase of image pre-processing, images were transformed from RGB to grayscale and binary images. Subsequently, in the second phase, resizing, cropping, and edge detection were applied to all images. Five geometrical features were considered in the feature extraction phase that represent important geometric characteristics of the segmented cells: normalized area, solidity, eccentricity, circularity, and normalized perimeter. These characteristics help differentiate various types of WBCs and enable accurate classification through K-means clustering. The study focus was analysis of each feature for accuracy. After experimentation, it was concluded that the circularity feature was most significant as it achieved the highest accuracy (67%), with the eccentricity feature having the lowest accuracy of 43%. Gautam et al. [20] also presented a technique initiated via pre-processing of microscopic images. Pre-processing involved conversion of RGB (Red, Green, Blue) images to grayscale, contrast stretching, and histogram equalization. Subsequently, they applied segmentation through Otsu's thresholding method, followed by geometrical feature extraction, including perimeter, area, eccentricity, and circularity. Finally, a Naïve Bayes classifier was used for classification with the maximum likelihood method, achieving 80.88% accuracy. Savkare et al. [21] presented an alternative method for blood cell segmentation; their pre-processing approach employed median and Laplacian filters to enhance image quality. After pre-processing, images were transformed from RGB to HSV (Hue, Saturation, Value) color space. Subsequently, K-means clustering was applied for segmentation of blood cells. Furthermore, they used morphological operations and a watershed algorithm to refine cell separation. The proposed K-means clustering method achieved an accuracy of 95.5%.
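Several of the steps named above (grayscale conversion, Otsu's thresholding, and the circularity feature) are compact enough to sketch directly. The following is a minimal pure-Python illustration under stated assumptions, not the implementation used in any cited study: the grayscale weights are the common ITU-R BT.601 luma coefficients, pixels are assumed to be 8-bit integers, and circularity uses the standard definition 4πA/P².

```python
import math


def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to an 8-bit grey level (BT.601 weights)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximising between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def circularity(area, perimeter):
    """Shape feature 4*pi*A / P^2: equals 1 for a circle, less otherwise."""
    return 4.0 * math.pi * area / (perimeter ** 2)


# Toy example: a dark background (10) and a bright nucleus-like region (200).
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
mask = [p > t for p in gray]  # True marks the brighter class
print(t, sum(mask))
print(round(circularity(4.0, 8.0), 3))  # a 2 x 2 square
```

In practice the binary mask would be cleaned with morphological operations before area and perimeter are measured on each connected region.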

Feature extraction-based ML techniques.
Typically, a differential counting method of WBCs is used to assess a patient's immune system. This method involves using flow cytometry and fluorescent markers, which may disturb the cell due to repetitive sample preparation. Accordingly, label-free techniques that use imaging flow cytometry and ML algorithms to classify unstained WBCs are considered a more effective approach. Toh et al. [22] previously reported a mean F1-score of 97% across B and T subtypes, with each individual subtype achieving a distinct F1 score of 78%. Tsai et al. [23] proposed a multi-class support vector machine (SVM) approach to hierarchically identify and categorize blood cell images; segmentation was implemented on digital images to retrieve geometric features from each segment, enabling identification and classification of different blood cell types. The experimental outcomes were compared with manual results, revealing that the proposed method significantly outperformed manual classification with an accuracy of 95.3%. Likewise, Şengür et al. [24] presented a model combining image processing (IP) and ML techniques for WBC classification. Shape-based features and deep features were utilized to describe WBCs, with a long short-term memory (LSTM) model applied to a dataset comprising 349 blood smear images with 10-fold cross-validation, from which 35 geometric and statistical features were extracted. More recently, Elen and Turan [25] compared six ML techniques (decision tree classifier, Random Forest, K-Nearest Neighbor, Multinomial Logistic Regression, Naïve Bayes, and SVM) for WBC categorization. Using shape-based features, an accuracy of 80% was achieved, while deep features achieved 82.9% accuracy. Overall, Multinomial Logistic Regression returned the highest precision rate of 95%, followed by Random Forest.
Huang et al. [26] presented a technique for WBC segmentation, delineating their approach into three phases: nucleus segmentation and recognition, feature extraction, and classification. A leukocyte (WBC) nucleus enhancer (LNE) was used to enhance the contrast of nucleus colors for segmentation, after which multiple levels of Otsu's thresholding were applied, effectively preserving only the WBCs and suppressing other cell types. During the feature extraction phase, a gray-level co-occurrence matrix was employed, from which 80 texture features were extracted. Subsequently, they incorporated shape-based features, including compactness and roughness, after which Principal Component Analysis (PCA) was used to reduce feature dimensions. Classification was achieved using a genetic-based parameter selector (GBPS) with 50X cross-validation, resulting in 95% classification accuracy. Yampri et al. [27] also segmented out the WBCs via automatic thresholding (i.e., segregation of cell nucleus from cytoplasm) and feature extraction. Eigen cells were used to remove segments by applying the following approach: conversion of cell image to vector, computation of mean and covariance of vector, and computation of eigenvalues and eigenvectors. Principal component analysis (PCA) was used to transform the high-dimensional eigenspace to a significantly lower dimensional space, with 92% classification accuracy achieved.
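To make the gray-level co-occurrence matrix (GLCM) step concrete, the sketch below builds a normalized GLCM for a single pixel offset and derives two commonly used texture features, contrast and energy. It is an illustrative simplification: the cited work extracted 80 texture features, whereas the offset, the symmetric counting, and the two features shown here are assumptions chosen for brevity.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    image  : 2D list of integer grey levels in range(levels)
    dx, dy : column and row offset between the paired pixels
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weighted sum of squared grey-level differences; 0 for flat texture."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared probabilities (angular second moment)."""
    return sum(v * v for row in p for v in row)


# Toy 2-level image: left half dark (0), right half bright (1).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
p = glcm(toy, levels=2)
print(p)
print(contrast(p), energy(p))
```

Features computed over several offsets and directions are typically concatenated into one vector before dimensionality reduction with PCA.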

Classification-based (focused) ML techniques.
Tavakoli et al. [28] developed a three-phase ML method for improved WBC classification, delineated as follows: nuclei/cytoplasm detection, feature extraction, and classification. A novel process was designed to segment the entire nucleus, while cytoplasm segmentation involves location detection inside the convex region. In the next phase, four unique colors and three shape features were extracted, and finally, in the last phase, SVM was used for WBC classification. Overall, 94.2% accuracy was achieved on the BCCD dataset, 92.2% on the LISC dataset, and 94.65% on the Raabin-WBC dataset; however, hyperparameter issues were encountered.
An innovative "Computer-aided diagnostic system" method was proposed by Malkawu et al. in 2020, with this process utilizing a hybrid approach whereby CNN was employed as a feature extractor. The performance of several classifiers was measured, with Random Forest (RF) outperforming other classifiers based on a 98.7% accuracy [29]. A similar multi-approach (i.e., comparison of several ML algorithms) by Gupta et al. [30] presents an optimized form of the Binary Bat algorithm (OBBA) inspired by bat echolocation techniques. Using OBBA (Table 3), dimensionality reduction was achieved by eliminating �11 similar features. Four classifiers (KNN, Logistic Regression, RF, and DT) were applied for WBC classification, with OBBA demonstrating the highest performance (mean accuracy of 97.3%), thereby surpassing other optimizers such as the Optimized Crow Search Algorithm (OCSA), which attained an accuracy of 92.8%, and the Optimized Cuttlefish Algorithm (OCFA), with an accuracy of 95.2%.
Lee [31] proposed an innovative approach to image segmentation based on grey-level thresholding, drawing on previous findings that the cell-type specific reaction of the cells produces adequate evidence to allow precise classification. This method was tested on a dataset comprising 1149 WBCs from 13 altered, clinically significant categories. Cells were randomly selected from 20 blood smear images obtained from leukemia patients, with cell sorting based on quantitative volumes in the segmented images producing a classification accuracy of 82.6%.

White blood cell classification using deep learning techniques
Wibawa et al. [32] proposed a DL model for classifying two WBC types, comparing the results with conventional machine learning methods (support vector machines), using nine features for classification. The authors report that deep learning significantly exceeded conventional ML methods, achieving highest accuracy of 95.5%. Toğaçar [33] introduced a WBC classification approach based on the coefficient and ridge feature selection method utilizing a CNN model with GoogleNet and ResNet50 for feature extraction. They achieved 97.95% accuracy for WBC classification and counting. Likewise, CNN was employed to identify and classify segmented WBC images as being "granular" or "non-granular". Subsequently, granular cells were further categorized into eosinophils and neutrophils, while non-granular cells were classified as lymphocytes and monocytes [34]. To enhance dataset robustness, augmentation approaches were implemented, resulting in improved accuracy for both binary and multi-classification of blood cell subtypes, leading to 98.51% precision for binary WBC classification and 97.7% precision for subtype classification.
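The two-stage scheme described above (granular versus non-granular first, then subtype) can be expressed as a small dispatch function. The sketch below is a hedged illustration only: the three predictor functions are hypothetical toy rules based on mean intensity, standing in for trained CNN models, and are not drawn from the cited study.

```python
from typing import Callable, Sequence

Image = Sequence[Sequence[float]]  # stand-in type for a segmented cell image


def classify_hierarchically(
    image: Image,
    is_granular: Callable[[Image], bool],
    granular_subtype: Callable[[Image], str],
    non_granular_subtype: Callable[[Image], str],
) -> str:
    """Stage 1 selects the branch; stage 2 names the subtype within it."""
    if is_granular(image):
        return granular_subtype(image)      # eosinophil or neutrophil
    return non_granular_subtype(image)      # lymphocyte or monocyte


# --- Toy stand-ins for trained models (hypothetical, illustration only) ---
def mean_intensity(image: Image) -> float:
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def toy_is_granular(image: Image) -> bool:
    return mean_intensity(image) > 0.5


def toy_granular_subtype(image: Image) -> str:
    return "eosinophil" if mean_intensity(image) > 0.75 else "neutrophil"


def toy_non_granular_subtype(image: Image) -> str:
    return "lymphocyte" if mean_intensity(image) > 0.25 else "monocyte"


def predict(image: Image) -> str:
    return classify_hierarchically(
        image, toy_is_granular, toy_granular_subtype, toy_non_granular_subtype
    )


print(predict([[0.9, 0.9]]))
print(predict([[0.1, 0.1]]))
```

One consequence of this design is that a stage-one error cannot be recovered in stage two, so the binary classifier's accuracy bounds the subtype accuracy.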
Lippeveld et al. [35] employed a relatively small dataset to examine human blood samples using image flow cytometry, with two models used to identify eight WBC types and eosinophils exclusively. ML models were applied to both datasets to classify human blood cells with 5-fold cross validation. Random Forest (RF) and Gradient Boosting (GB) were used for the first model, while deep learning CNN architectures (ResNet (RN) and DeepFlow (DF)) were employed for the second model. On the WBC dataset, results demonstrated balanced accuracies of 77.8% and 70%, while similarly for the eosinophil dataset, balanced accuracies of 87.1% and 85.6% were achieved. DF outperformed the RN architecture on the WBC dataset, acquiring a classification accuracy of 70.3% compared to RN's 64.9%.
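Because this study reports balanced accuracy under 5-fold cross-validation, a brief sketch of both ideas may help. It assumes "balanced accuracy" carries its usual meaning (the mean of per-class recall) and uses contiguous, unshuffled folds for simplicity; the labels in the example are invented for illustration and are not data from the cited study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Invented, imbalanced example: a model that always predicts the majority class.
y_true = ["neutrophil"] * 8 + ["eosinophil"] * 2
y_pred = ["neutrophil"] * 10
print(accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))

folds = list(k_fold_indices(10, 5))
print(folds[0])
```

The example shows why balanced accuracy is preferred for skewed leukocyte distributions: plain accuracy rewards ignoring the rare class, while balanced accuracy does not.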
Rawat et al. [36] introduced another deep learning method employing the DenseNet121 model for classification of several WBC types. The proposed model was evaluated, with an accuracy of 98.84%. Results indicate that the DenseNet121 model with a batch size of 8 exhibited the highest overall performance. The dataset, consisting of 12,444 images, was obtained from Kaggle. Nazlibilek et al. [37] proposed a DL-based method that leveraged image variation operations and generative adversarial networks (GAN) for accurately classifying WBCs into five distinct types. Likewise, Sadeghian et al. [38] developed a two-stage model comprising an initial alteration using a pre-trained model, followed by the integration of a ML classifier. They reported 97.03% classification accuracy on the BCCD dataset, a downscaled blood cell detection dataset. Macawile et al. [39] utilized Convolutional Neural Networks (CNNs) to effectively classify and count WBCs in microscopic blood images. Among the proposed models (AlexNet, GoogleNet, and ResNet-101), AlexNet performed better than the other two. It demonstrated an overall accuracy of 96.63%, albeit with a relatively lower sensitivity rate of 89.18%.
Liang et al. [40] introduced an innovative approach that merges convolutional neural networks (CNNs) with recurrent neural networks (RNNs). This fusion, termed the CNN-RNN framework, enhances understanding of image content and structured feature learning, enabling end-to-end training for comprehensive medical image data analysis. They applied transfer learning, adapting pre-trained weight parameters from the ImageNet dataset for the CNN segment. Additionally, a customized loss function was integrated to expedite training and achieve precise weight parameter convergence. Experimental results indicate a classification accuracy of 90.79%. More recently, Sharma et al. [41] presented yet another CNN-based classification methodology, achieving 96% accuracy for binary classification and 87% accuracy for multiclass classification.
Togacar et al. [42] employed a very different DL approach to WBC classification by using a computer-aided automated approach. Utilizing Regionally Based Helixal Neural Networks, their study effectively classified and differentiated WBCs, achieving a high level of classification accuracy (99.52%). Toğaçar et al. [33] also introduced a method composed of three essential phases. In the initial stage, CNN models, specifically AlexNet, GoogleNet, and ResNet-50, are utilized as feature extractors. Subsequently, the features extracted from these CNN model layers are fused. In the second phase, the technique incorporates feature selection methods, including MIC and Ridge Regression. In the third phase, these selected features are amalgamated. The overlapping features derived from the MIC and Ridge Regression techniques are then classified using the QDA method. This integrated approach achieves an overall success rate of 97.95% in classifying WBCs.
Mohamed [43] introduced an alternative method for the identification and classification of blood cells based on CNN. The study presented two distinct approaches for classifying WBCs. In the initial approach, CNN was employed with transfer learning, utilizing pre-trained weight parameters applied to the images. In contrast, the second approach utilized Support Vector Machines (SVM) for the classification process. The CNN outperformed the SVM, with classification accuracies of 98.4% and 90.6%, respectively. Yao et al. [44] introduced a CNN-based approach for the classification of WBCs. In their method, built on the EfficientNet architecture, the CNN integrated an optimizer that adaptively adjusts parameters such as the learning rate in response to changes in loss and accuracy. Their proposed model achieved an accuracy of 90%.
Khosrosereshki et al. [45] developed an R-CNN-based model to identify neutrophils, eosinophils, monocytes, and lymphocytes, with two models employed, namely Faster R-CNN and YOLOv4. Faster R-CNN obtained an accuracy of 96.25%, while YOLOv4 was slightly lower at 95%. Likewise, Bouchet et al. [46]
Ullah et al. [48] introduced a 3D-CNN feature-based CBVR system that is highly efficient and effective for retrieving similar content from vast video data repositories. After an in-depth exploration of its effectiveness in representing sequential frames, they selected middle layer features of a 3D-CNN model. Leveraging a mechanism for selecting convolutional features, only the active feature maps from the CNN layer that correspond to the ongoing event in the frame sequence are chosen. To condense the size of the extracted high-dimensional features for streamlined retrieval and expedited storage, they introduced the concept of hashing. These high-dimensional features are represented in compact binary codes through PCA, ensuring efficient search and reduced storage requirements for WBC classification. For the classification of WBCs, the achieved accuracy is 85%.
Imran et al. [49] conducted a study involving the utilization of a four-hidden-layer feed-forward DNN and CNN. The research also extensively examines the impact of Mel-Frequency Cepstral Coefficients (MFCC) and Filter Bank Energies (FBE) features trained with various context sizes on two deep learning models, evaluated under normal, slow, and fast speaking rates. Micro-level analysis of results was conducted, revealing that the four-hidden-layer CNN slightly outperforms the DNN in classifying WBCs. The CNN achieved an accuracy of 83% in classifying WBCs. Kastrati et al. [50] introduced a convolutional neural network with three hidden layers, each having 1024 neurons, for white blood cell classification on the INFUSE dataset, achieving an accuracy of 78.10%.
Ullah et al. [51] introduced an innovative conflux Long Short-Term Memory (LSTM) network for WBC classification. The framework involves four stages: 1) frame-level feature extraction, 2) feature propagation through the conflux LSTM network, 3) pattern acquisition and correlation computation, and 4) action classification. The process begins with extracting deep features using a pre-trained VGG19 CNN model from frame sequences for each view. Extracted features then undergo conflux LSTM processing to learn unique view-specific patterns. Inter-view correlations are computed by utilizing pairwise dot products from LSTM outputs across views, thus acquiring interdependent patterns. The VGG19 CNN model achieved a classification accuracy of 88.9%. Meanwhile, Banik et al. [52] recently presented a CNN-based WBC image classifier which merges features from both the initial and final convolutional layers, while utilizing input image propagation through a convolutional layer to enhance performance. A dropout layer is added to counter overfitting, resulting in a classification accuracy of 98.61%. Another CNN-based approach has been developed by Ku et al. [53], who propose an automated system for leucocyte classification using a dual-stage CNN. A dataset of 2,174 patch images was collected for training and testing purposes, with the dual-stage CNN used to classify images into 4 classes, achieving an overall accuracy of 97.06%.
Karthikeyan et al. [54] introduced the Leishman-stained function deep classification (LSM-TIDC) model for WBC classification. Interestingly, the LSM-TIDC method explores the potential of interpolation and Leishman-stained function without the need for explicit segmentation, which if successfully implemented, effectively eliminates false regions in multiple input images. Following image pre-processing, relevant features are extracted through multidirectional feature extraction, with a system then developed, utilizing a transformation invariant model to extract nuclei and subsequently employing convolutional and pooling characteristics for cell classification. Method testing was conducted on the Kaggle dataset, and classification accuracy of 94.42% was achieved.
Upon comparing the identified 136 relevant studies, as a general observation, detection of WBCs through conventional methods (ML) tends to focus on cell segmentation after data preprocessing, with segmented data then typically employed for feature extraction in WBC classification. Accordingly, the traditional ML methods were associated with better results as accurate identification of WBCs is impractical in the absence of efficient segmentation, thus resulting in higher levels of classification accuracy (Tables 4-8). Research teams employed a range of methods for data segmentation and obtained a range of classification accuracies; while some conventional models achieved up to 99% accuracy, accuracy was shown to decrease in concurrence with decreasing dataset size (e.g., Lippeveld et al. [35]). Deep learning models exhibited improved performance for more extensive datasets and exhibited higher levels of accuracy in concurrence with increasingly large datasets (Table 6). Several authors implemented a combination of different datasets to probe the accuracy of their models on unknown datasets (i.e., blind testing). Deep learning models have represented a significant breakthrough in myriad domains and, as shown in the identified literature, the use of traditional machine learning models within biomedical applications in general, and WBC classification in particular, is undoubtedly shifting toward the use of deep learning models based on dataset size. However, deep learning algorithms (and associated research) are now in a significantly more advanced phase, with proven capacity to solve increasingly complex problems with higher performance. Notwithstanding, there is a clear gap in the use of the latest advances in deep learning, including the use of transfer knowledge and meta-learning processes.
Comparative analysis of deep learning models applied to various large datasets revealed remarkably high levels of achieved accuracy across various studies (Table 9). Baghel et al. [97] demonstrated a high level of efficacy associated with the use of CNNs, achieving an accuracy of 98.51%, while Riaz et al. [117] used a Convolutional Generative Adversarial Network (GAN) to obtain a classification accuracy of 99.9% on the Catholic University of Korea dataset. Mosabbir et al. [118] addressed the challenging National Institutes of Health (NIH) dataset using a CNN, attaining an accuracy of 97.92%. Tusneem et al. [119] also used a CNN and demonstrated its strength, with a 99.7% classification accuracy. Kakumani et al. [120] utilized a pretrained InceptionV3 model on the Kaggle dataset and achieved 99.76% classification accuracy.

Limitations of previous studies and future challenges
ML/DL researchers have made significant advances in increasingly accurate classification of WBCs in recent years. Among all techniques based on SVM, Sajjad et al. [6] achieved maximum accuracy, sensitivity, and specificity of 98.6%, 96.2%, and 98.5%, respectively. Using KNN, Abdeldaim et al. [68] achieved maximum accuracy of 98.6%. Similarly, using ANN, Hegde et al. [70] acquired accuracy, sensitivity, and specificity of 99%, 99.4%, and 99.18%, respectively. Using DL methods, Loey et al. achieved maximum precision and sensitivity of 100% each and [96,104], with unsupervised or semi-supervised systems needed to address these issues [105]. Moreover, TML- and DL-based MIA applications and systems still require significant work before they can be adopted in real-time applications.
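For reference, the accuracy, sensitivity, specificity and precision figures quoted above all derive from confusion-matrix counts. The sketch below shows the standard definitions for one class treated one-vs-rest; the function names and the eight example labels are hypothetical and do not come from any reviewed study.

```python
from collections import Counter

def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN when `positive` is treated as the class of interest."""
    c = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            c["tp"] += 1
        elif t != positive and p == positive:
            c["fp"] += 1
        elif t == positive and p != positive:
            c["fn"] += 1
        else:
            c["tn"] += 1
    return c["tp"], c["fp"], c["tn"], c["fn"]

def binary_metrics(tp, fp, tn, fn):
    """Standard definitions; assumes every denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall, true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
    }

# Hypothetical ground truth and predictions for eight cells.
y_true = ["neutrophil", "neutrophil", "lymphocyte", "monocyte",
          "eosinophil", "neutrophil", "lymphocyte", "basophil"]
y_pred = ["neutrophil", "lymphocyte", "lymphocyte", "monocyte",
          "eosinophil", "neutrophil", "neutrophil", "basophil"]

print(binary_metrics(*one_vs_rest_counts(y_true, y_pred, "neutrophil")))
```

Reporting sensitivity and specificity alongside accuracy matters for WBC data because class frequencies are typically imbalanced, so a high accuracy alone can mask poor detection of rare cell types.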

Lack of publicly accessible datasets
The lack of publicly accessible datasets represents the primary issue affecting medical image analysis. Scientists need to encourage health organizations to address this problem, as the availability of high-quality data would be of significant benefit to researchers. Initiatives promoting open data availability from various health organizations worldwide should also be encouraged, although appropriate authorization should be required (e.g., conditional access to hospital data and datasets). The issue becomes all the more relevant given that data are readily available in large quantities in other fields such as environmental science, weather forecasting, and bioinformatics, and in other research applications (e.g., video summarization [106], IoT [107], and energy management [108]).
Acquiring very large, high-quality datasets with accurate labeling is crucial for MIA applications.

Generalization skills for trained predictors
Another very significant challenge associated with MIA and WBC identification and classification is the availability of appropriately trained predictors. Solving this issue requires a learning method that balances computational efficiency with generalization capacity. To build a model with strong generalization capabilities, a learning approach that incorporates true or random labels is necessary. This approach provides efficient training algorithms and practical tools to handle available datasets using accurate or arbitrary labels. Many MIA tasks, including identifying brain tumors, lung cancer, breast cancer, and leucocytes, have shown significant empirical success. Despite the inherent challenges posed by non-convex optimization, basic techniques such as stochastic gradient descent (SGD) can efficiently discover viable solutions, effectively minimizing training errors. More interestingly, the networks created in this manner have strong generalization capabilities [109], even when there are far more parameters than training data [110]. However, reducing the training error alone during model training is insufficient. The choice of global minima greatly impacts the generalization

Reliable methods for real-world scenarios
TML and DL approaches lend reliability to real-world health diagnosis systems [111]. However, constructing accurate MIA and leukocyte classification models requires a high degree of expertise and technical skill. In the future, researchers should prioritize developing accurate and trustworthy procedures that integrate smoothly into real-world healthcare environments, reducing reliance on specialized medical professionals. This involves tackling issues related to model generalization, data variability, and interpretability, and ensuring consistent performance across diverse patient populations and clinical scenarios.

Future research directions
The biomedical engineering and research community should dedicate substantial effort to support MIA, particularly leukocyte examination in blood images, due to the significant challenges faced by the MIA community, as detailed in section V.
i. Data augmentation methods to address the dataset deficit. This work addresses the issue of limited dataset availability in MIA and leucocyte classification. We present a data augmentation approach and leverage transfer learning algorithms to enhance the identification of WBCs.
ii. Technical skills and medical experience required. TML and DL models have shown significant potential for computer-aided MIA-based diagnostic applications, and popular open-source frameworks like TensorFlow, Caffe, and Keras offer access to these advanced models [121]. However, developing effective ML models for MIA requires careful consideration and expertise in the clinical and medical domains. It is essential to choose and train a suitable model to achieve accurate and reliable results in MIA applications.
iii. Resource-aware DL models for classifying leukocytes. MIA has benefited from the adoption of advanced DL models such as GANs, R-CNN, Fast R-CNN, and Faster R-CNN, along with the integration of TML and DL methods. These models have shown superior performance in tasks like brain tumor detection, leukocyte classification, breast cancer diagnosis, and various other MIA applications. However, their biggest drawbacks are significant memory requirements and computational costs. Therefore, it is necessary to investigate computationally efficient and environmentally friendly TML and DL models for leukocyte analysis in blood images.
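As an illustration of the data augmentation idea raised in item i, the sketch below generates the eight flip and rotation variants of a single image. It is an assumption-laden toy example: images are represented as nested Python lists, the function names are hypothetical, and only geometric transforms are shown (GAN-based augmentation, also discussed in this review, is not covered). Flips and 90-degree rotations are commonly treated as label-preserving for blood cells because smears have no canonical orientation.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Return the 8 flip/rotation variants of `img` (4 rotations x optional mirror)."""
    variants = []
    current = [row[:] for row in img]  # copy so the input is not modified
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants

# Hypothetical 2x2 grayscale patch standing in for a cropped cell image.
patch = [[1, 2],
         [3, 4]]
for variant in augment(patch):
    print(variant)
```

Each variant would inherit the label of the original image, so a dataset of N labelled cells yields up to 8N training samples at negligible computational cost.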

iv. Models for the detection and classification of leukocytes
DNNs provide a superior alternative to conventional learning techniques. The appeal of end-to-end models, especially CNNs, stems from their efficient processing and their capability to classify leucocytes into five classes. These models compete with complex DNN-based MIA models built on data-driven learning methodologies. WBC detection and categorization in images can also be accomplished using a variety of end-to-end designs [122-124].

v. TML and DL universal evaluation in MIA
The MIA research community often relies on subjective evaluation methods, which can be challenging, inefficient, and prone to errors. Therefore, comprehensive evaluation techniques are needed that can automatically assess the effectiveness of TML and DL models for MIA from various perspectives.

vi. Vision Transformers and Vision Foundation Models
While Vision Transformers (ViTs) were not included in the current review, they represent a likely cutting-edge approach for the future of white blood cell (WBC) classification (and other forms of imagery analysis), employing an advanced self-attention mechanism to extract crucial features from input images. Additionally, ViTs leverage transfer learning by incorporating pre-trained model weights, further boosting their performance. This dual approach captures subtle features, significantly enhancing the precision and accuracy of WBC classification, a major advancement in the realm of medical imaging. Likewise, vision foundation models are powerful generative deep learning models trained on large datasets for classification, segmentation, and detection, and will likely become a frequently employed approach for medical imaging in future.
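The self-attention mechanism mentioned above can be illustrated with scaled dot-product attention over a handful of patch embeddings. This is a deliberately simplified sketch: it assumes identity query/key/value projections instead of learned weight matrices, uses a single head, and operates on hypothetical two-dimensional "patch" vectors. A real ViT adds learned projections, multiple heads, positional embeddings and feed-forward layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (simplifying assumption).

    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights

# Hypothetical embeddings for three image patches.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(patches)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output vector is a weighted mixture of all patches, which is how attention lets distant regions of a cell image (for example, separate nuclear lobes) inform one another in a single layer.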

Conclusion
We provide a comprehensive review of the TML and DL techniques applied to WBC classification. We thoroughly explored and compared various methods for WBC categorization in this context. The data for this research were compiled from 136 primary papers published between 2006 and 2023. These papers encompass TML and DL methodologies for leukocyte classification and their applications in medical diagnosis. The comprehensive analysis of these studies reveals the significant contributions of TML and DL techniques to MIA. The main objective of this work is to identify and synthesize the myriad TML and DL applications in MIA, particularly in the domain of leucocyte classification in blood smear images. This research aims to provide valuable insights into the complex characteristics of TML and DL in MIA by thoroughly analyzing existing literature. Based on literature review outcomes, deep learning models such as CNNs for image classification and GANs for data augmentation should be increasingly employed to negate the limitations (e.g., time) and human biases/inaccuracies associated with manual classification. The study's results emphasize the importance of conducting more research on using TML and DL methods effectively in MIA and classifying leucocytes in blood smear images. Besides leucocyte classification, this study explored applications for advanced DL models. Collating these data will help researchers by indicating where they should focus future investigation of TML and DL models for MIA. These methods have the potential to lead to significant advancements in speech analysis, natural language processing (NLP), and medical imaging in the future. In addition to WBCs, TML and DL approaches are employed to identify and categorize various MIA domains, such as the analysis of MRI, CT, X-ray, and ultrasound images. Blood smear images are a growing field in MIA that has drawn attention from the research community over the past three decades. Additionally, we identified the problems, directions, and solutions for the development of TML and DL models in MIA, notably for classifying WBCs in blood smear images. In future, our research will be expanded to apply TML and DL approaches to different MIA domains, including MRI, CT, ultrasound, and X-ray images.

Fig 5. Identified articles delineated by country of origin (based on first and corresponding authors and origin of study dataset). https://doi.org/10.1371/journal.pone.0292026.g005

utilized the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model, an advanced hybrid architecture based on residual networks and RCNN principles. The proposed IRRCNN demonstrated exceptional accuracy in experiments, achieving a 100% accuracy rate for WBC classification. Jha et al. [47] developed a leukemia detection module specifically designed for blood smear images, with their multi-phase detection process comprising pre-processing, segmentation, feature extraction, and classification. The segmentation step utilizes a hybrid model based on Mutual Information (MI), which combines results from the active contour model and the fuzzy C-means algorithm. Subsequently, statistical and Local Directional Pattern (LDP) features are extracted from the segmented images. These features are then fed into a novel Deep CNN classifier based on the proposed Chronological Sine Cosine Algorithm (SCA) for classification purposes. Testing used blood smear images from the ALL-IDB2 database, with simulation results indicating that the developed classifier achieved an accuracy of 98.7%.

S2 Checklist. Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist.

Table 8. Deep learning model accuracy, specificity and sensitivity for WBC classification (n = 36).
However, while many researchers achieved close to maximum performance, several limitations and constraints have been associated with previous and current techniques. Accordingly, the research community faces several fundamental obstacles in the field of MIA that must be acknowledged and resolved. These include the lack of easily accessible, large, high-quality datasets, a shortage of dedicated medical professionals, and the complexity of Transfer Learning and Deep Learning methods. Several DML strategies, along with their mathematical and theoretical foundations, are also a source of several challenges

Table 9. Comparative analysis of best performing (based on reported accuracy) deep learning models (n = 8).
behavior of the predictor. It is crucial to select the appropriate algorithm to minimize training errors for better results. Different initialization, update, learning rate, and halting conditions for optimization algorithms will result in global minima with various degrees of generalizability.
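The dependence of the final solution on initialization, learning rate and halting conditions can be seen even on a one-dimensional non-convex function. The sketch below is a toy stand-in and not a neural network loss: the quartic `loss`, the parameter values and the function names are assumptions, and plain (deterministic) gradient descent is used in place of SGD so that the result is reproducible.

```python
def loss(w):
    """Toy non-convex objective with two separate minima (illustrative assumption)."""
    return w**4 - 3 * w**2 + w

def grad(w):
    """Analytical derivative of `loss`."""
    return 4 * w**3 - 6 * w + 1

def descend(w0, learning_rate=0.01, tol=1e-8, max_steps=10_000):
    """Gradient descent from `w0`; halts when |gradient| < tol or after max_steps."""
    w = w0
    for _ in range(max_steps):
        g = grad(w)
        if abs(g) < tol:  # halting condition
            break
        w -= learning_rate * g  # update rule
    return w

# Same optimizer, same settings, two different initializations.
for w0 in (-2.0, 2.0):
    w_final = descend(w0)
    print(f"start={w0:+.1f}  end={w_final:+.4f}  loss={loss(w_final):+.4f}")
```

Both runs drive the gradient to (numerically) zero, yet they settle in different minima with different loss values, which illustrates why reaching a low training error says little by itself about which solution was found.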